Font and Size Identification in Telugu Printed Document
Abstract
Telugu is an official language whose script derives from the ancient Brahmi script. It is also one of the oldest and most popular languages of India, spoken by more than 66 million people, especially in South India. While a large body of work exists on font and size identification for English and other languages, comparatively little has been reported on Optical Character Recognition (OCR) systems for Telugu text. Font and size identification is considerably harder for Indian scripts than for other scripts because of the very large number of combinations of characters and modifiers. Because font and size identification is a pre-processing step in OCR systems, it is an important area of interest in the development of an OCR system for Telugu script. The pre-processing tasks considered here are conversion of the grayscale image to a binary image, image clearing, segmentation of the text into lines, and separation of connected components. Zonal analysis is then applied, and the top-zone components are identified and checked for the presence of a tick mark. The aspect ratio and pixel ratio of each component are calculated, and the font and size are identified by comparing these two parameters with a database. Simulation studies were carried out in MATLAB, with a GUI, for all the segmentation methods on the given Telugu text, and the observed results were good enough to identify the different fonts and sizes in that text.
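The steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' MATLAB implementation. Everything specific here is an assumption made for illustration: the fixed threshold of 128, the definition of aspect ratio as bounding-box width divided by height, the definition of pixel ratio as the share of ink pixels inside the bounding box, the nearest-neighbour comparison, and the database entries ("FontA", "FontB"). Connected-component separation, zonal analysis and the tick-mark check are omitted for brevity.

```python
from math import hypot

def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 = ink, 0 = background.
    The fixed threshold is an assumption, not taken from the paper."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def segment_lines(binary):
    """Split the page into text lines using a horizontal projection:
    a line is a maximal run of rows that contain at least one ink pixel.
    Returns (top, bottom) row indices, bottom exclusive."""
    lines, start = [], None
    for y, row in enumerate(binary):
        has_ink = any(row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            lines.append((start, y))
            start = None
    if start is not None:
        lines.append((start, len(binary)))
    return lines

def component_features(binary):
    """Return (aspect_ratio, pixel_ratio) for the ink in a binary region.
    Both definitions are assumptions: width / height of the bounding box,
    and ink pixels / bounding-box area."""
    ys = [y for y, row in enumerate(binary) if any(row)]
    xs = [x for row in binary for x, px in enumerate(row) if px]
    if not ys:
        raise ValueError("region contains no ink pixels")
    height = ys[-1] - ys[0] + 1
    width = max(xs) - min(xs) + 1
    ink = sum(sum(row) for row in binary)
    return width / height, ink / (width * height)

def identify(features, database):
    """Return the (font, size) key whose stored feature pair is nearest
    (Euclidean distance) to the measured one."""
    ar, pr = features
    return min(
        database,
        key=lambda k: hypot(ar - database[k][0], pr - database[k][1]),
    )

# Tiny synthetic page: two "text lines" separated by a blank row.
gray = [
    [255, 255, 255, 255, 255, 255],
    [255,   0,   0,   0,   0, 255],
    [255,   0, 255, 255,   0, 255],
    [255, 255, 255, 255, 255, 255],
    [255,   0,   0, 255, 255, 255],
]
binary = binarize(gray)
lines = segment_lines(binary)
top, bottom = lines[0]
features = component_features(binary[top:bottom])

# Hypothetical reference database: (font, size) -> (aspect ratio, pixel ratio).
database = {("FontA", 12): (2.0, 0.7), ("FontB", 14): (1.0, 0.5)}
match = identify(features, database)
```

In a real system the reference values would be measured from labelled samples of each font and size, and the comparison would probably need a tolerance rather than a plain nearest match.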
Similar resources
Multi-font Optical Character Recognition System for Printed Telugu Text
The Telugu OCR systems currently available on the market recognize only specific Telugu fonts. This paper describes the development of a multi-font OCR system for printed Telugu characters using artificial neural networks. In this system, the characters are classified using a multilayer neural network architecture.
Font and Function Word Identification in Document Recognition
... font would be used during recognition. This would reduce the confusion caused by training on many fonts and would effectively reduce the recognition problem to choosing the ... An algorithm is presented that identifies the predominant font in which the running text in an English language document is printed. Frequent function words (such as the, of, and, a, and to) are also recognized as part of th...
FONT DISCRIMINATION USING FRACTAL DIMENSIONS
One problem related to OCR systems is the discrimination of fonts in machine-printed document images. Solving it improves the performance of general OCR systems. The methods proposed in this paper are based on various fractal dimensions for font discrimination. First, some predefined fractal dimensions were combined with directional methods to enhance font differentiation. Then, a novel fractal dime...
Identification of Telugu, Devanagari and English Scripts Using Discriminating Features
In a multi-script, multi-lingual environment, a document may contain text lines in more than one script or language. The different script regions of the document must be identified so that each can be fed to the OCR of the corresponding language. In this context, this paper proposes to develop a model to identify and separate text lines of Telugu, Devanagari and English scripts from a ...
An Adaptive Character Recognizer for Telugu Scripts Using Multiresolution Analysis, Associative Memory
The present work is an attempt to develop a commercially viable and robust character recognizer for Telugu texts. We aim to design a recognizer that exploits the inherent characteristics of the Telugu script. Our proposed method uses wavelet multiresolution analysis to extract features and an associative memory model to accomplish the recognition tasks. Our system learns the ...
Publication date: 2013